Abstract

Preference queries are relational algebra or SQL queries that contain occurrences 
of the winnow operator (find the most preferred tuples in a given relation). Such 
queries are parameterized by specific preference relations. Semantic optimization 
techniques make use of integrity constraints holding in the database. In the con- 
text of semantic optimization of preference queries, we identify two fundamental 
properties: containment of preference relations relative to integrity constraints and 
satisfaction of order axioms relative to integrity constraints. We show numerous 
applications of those notions to preference query evaluation and optimization. As 
integrity constraints, we consider constraint-generating dependencies, a class
generalizing functional dependencies. We demonstrate that the problems of
containment and satisfaction of order axioms can be captured as specific
instances of constraint-generating dependency entailment. This makes it possible
to formulate necessary
and sufficient conditions for the applicability of our techniques as constraint validity 
problems. We characterize the computational complexity of such problems. 

Key words: preference queries, query optimization, query evaluation, integrity 
constraints 



1 Introduction 



The notion of preference is becoming more and more ubiquitous in present-day 
information systems. Preferences are primarily used to filter and personalize 
the information reaching the users of such systems. In database systems,
preferences are usually captured as preference relations that are used to build
preference queries. From a formal point of view, preference relations are simply
binary relations defined on query answers. Such relations provide an abstract,
generic way to talk about a variety of concepts like priority, importance,
relevance, timeliness, reliability etc. Preference relations can be defined
using logical formulas or special preference constructors (preference
constructors can be expressed using logical formulas). The embedding of
preference relations into relational query languages is typically provided
through a relational operator that selects from its argument relation the set of
the most preferred tuples, according to a given preference relation. This
operator has been variously called winnow (the term we use here), BMO, and Best.
It is also implicit in skyline queries [1]. Being a relational operator, winnow
can clearly be combined with other relational operators, in order to express
complex preference queries.

Example 1 We introduce an example used throughout the paper. Consider
the relation Book(ISBN, Vendor, Price) and the following preference relation
$\succ_{C_1}$ between Book tuples:

prefer one Book tuple to another if and only if their ISBNs are the same
and the Price of the first is lower.

Consider the instance $r_1$ of Book in Figure 1. Then the winnow operator
$\omega_{C_1}$ returns the set of tuples in Figure 2.



ISBN        Vendor        Price
0679726691  BooksForLess  $14.75
0679726691  LowestPrices  $13.50
0679726691  QualityBooks  $18.80
0062059041  BooksForLess  $7.30
0374164770  LowestPrices  $21.88

Figure 1. The Book relation

ISBN        Vendor        Price
0679726691  LowestPrices  $13.50
0062059041  BooksForLess  $7.30
0374164770  LowestPrices  $21.88

Figure 2. The result of winnow



Example 2 The above example is a one-dimensional skyline query. To see an
example of a two-dimensional skyline, consider the schema of Book expanded
by another attribute Rating. Define the following preference relation
$\succ_{C_2}$:

prefer one Book tuple to another if and only if their ISBNs are the same
and the Price of the first is lower and the Rating of the first is not lower, or
the Price of the first is not higher and the Rating of the first is higher.

Then $\omega_{C_2}$ is equivalent to the following skyline (in the terminology of [5]):

SKYLINE ISBN DIFF, Price MIN, Rating MAX. 

The above notation indicates that only books with the same ISBN should be 
compared, that Price should be minimized, and Rating maximized. In fact, the 
tuples in the skyline satisfy the property of Pareto-optimality, well known in 
economics. 
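
As a rough illustration, the two-dimensional preference of Example 2 can be evaluated naively in Python. This is only a sketch: the Rating values are hypothetical (the example gives none), the names RatedBook, prefers_c2 and skyline are introduced here for illustration, and the code assumes the reading in which the ISBN condition applies to both disjuncts, as the ISBN DIFF clause suggests.

```python
from typing import NamedTuple


class RatedBook(NamedTuple):
    isbn: str
    vendor: str
    price: float
    rating: int  # hypothetical values, not taken from the paper


def prefers_c2(a: RatedBook, b: RatedBook) -> bool:
    """True if a is preferred to b under the two-dimensional preference."""
    if a.isbn != b.isbn:  # assumed: only books with the same ISBN are compared
        return False
    return (a.price < b.price and a.rating >= b.rating) or (
        a.price <= b.price and a.rating > b.rating
    )


def skyline(rows):
    """Naive quadratic evaluation: keep the tuples that nothing dominates."""
    return [t for t in rows if not any(prefers_c2(s, t) for s in rows)]


RATED_BOOKS = [
    RatedBook("0679726691", "BooksForLess", 14.75, 5),
    RatedBook("0679726691", "LowestPrices", 13.50, 3),
    RatedBook("0679726691", "QualityBooks", 18.80, 4),
    RatedBook("0062059041", "BooksForLess", 7.30, 2),
]

for book in skyline(RATED_BOOKS):
    print(book)
```

With these made-up ratings, the QualityBooks offer is dominated by the BooksForLess offer for the same ISBN (cheaper and better rated), while the two remaining offers for that ISBN are incomparable and both stay in the skyline.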



Preference queries can be reformulated in relational algebra or SQL, and thus
optimized and evaluated using standard relational techniques. However, it
has been recognized that specialized evaluation and optimization techniques
promise in this context performance improvements that are otherwise unavailable.
A number of new algorithms for the evaluation of skyline queries (a special
class of preference queries) have been proposed. Some of them can be used to
evaluate more general preference queries [9]. Also, algebraic laws that
characterize the interaction of winnow with the standard operators of relational
algebra have been formulated. Such laws provide a foundation for the rewriting
of preference queries. For instance, necessary and sufficient conditions for
pushing a selection through winnow have been described. The algebraic laws
cannot be applied unconditionally. In fact, the preconditions of their
applications refer to the validity of certain constraint formulas.

In this paper, we pursue this line of research a bit further. We study
semantic optimization of preference queries. Semantic query optimization has 
been extensively studied for relational and deductive databases [6]. As a result,
a body of techniques dealing with specific query transformations like join 
elimination and introduction, predicate introduction etc. has been developed. 
We view semantic query optimization very broadly and classify as semantic 
any query optimization technique that makes use of integrity constraints. In 
the context of semantic optimization of preference queries, we identify two 
fundamental semantic properties: containment of preference relations relative 
to integrity constraints and satisfaction of order axioms relative to integrity 
constraints. We show that those notions make it possible to formulate semantic 
query optimization techniques for preference queries in a uniform way. 

We focus on the winnow operator. Despite the presence of specialized evalua- 
tion techniques, winnow, being essentially an anti-join, is still quite an expen- 
sive operation. We develop optimizing techniques that: 

(1) remove redundant occurrences of winnow;

(2) coalesce consecutive applications of winnow;

(3) recognize when more efficient evaluation of winnow is possible. 

More efficient evaluation of winnow can be achieved, for example, if the given 
preference relation is a weak order (a negatively transitive strict partial order). 
We show that even when the preference relation is not a weak order (as in 
Example 1), it may become a weak order on the relations satisfying certain 
integrity constraints. We show a very simple, single-pass algorithm for eval- 
uating winnow under those conditions. We also pay attention to the issue of 
satisfaction of integrity constraints in the result of applying winnow. In fact, 
some integrity constraints may hold in the result of winnow, even though they 
do not hold in the relation to which winnow is applied. Combined with known 
results about the preservation of integrity constraints by relational algebra 
operators, our results provide a way for optimizing not only single
occurrences of winnow but also complex preference queries. 

As integrity constraints, we consider constraint-generating dependencies, a
class generalizing functional dependencies. Constraint-generating dependen- 
cies seem particularly well matched with preference queries, since both the 
former and the latter are formulated using constraints. We demonstrate that 
the problems of containment of preference relations and satisfaction of order 
axioms, relative to integrity constraints, can be captured as specific instances 
of dependency entailment. Our approach makes it possible to formulate nec- 
essary and sufficient conditions for the applicability of the proposed semantic 
query optimization techniques as constraint validity problems and precisely 
characterize the computational complexity of such problems, partly adopting
existing results on constraint-generating dependencies.

The plan of the paper is as follows. In Section 2, we provide background mate- 
rial on preference queries and constraint-generating dependencies. In Section 3, 
we introduce two basic semantic properties: relative containment and relative 
satisfaction of order axioms. In Section 4, we address the issue of eliminat- 
ing redundant occurrences of winnow. In Section 5, we study weak orders. In 
Section 6, we characterize dependencies holding in the result of winnow. In 
Section 7, we consider the computational complexity of the semantic proper- 
ties studied in the present paper. We discuss related work in Section 8, and 
conclude in Section 9. 



2 Basic notions 

We are working in the context of the relational model of data. For concreteness, 
we consider two infinite domains: D (uninterpreted constants) and Q (rational 
numbers). Other domains could be considered as well without influencing most
of the results of the paper. We assume that database instances are finite.
Additionally, we have the standard built-in predicates. We refer to relation 
attributes using their names or positions. 

We define constraints to be quantifier-free formulas over some signature of 
built-in operators, interpreted over a fixed domain - in our case D or Q. We 
will allow both atomic- and tuple-valued variables in constraints. The notation 
t[X] denotes the fragment of a tuple t consisting of the values of the attributes 
in the set X. 

2.1 Preference relations 

Definition 1 Given a relation schema $R(A_1 \cdots A_k)$ such that $U_i$,
$1 \le i \le k$, is the domain (either $D$ or $Q$) of the attribute $A_i$, a
relation $\succ$ is a preference relation over $R$ if it is a subset of
$(U_1 \times \cdots \times U_k) \times (U_1 \times \cdots \times U_k)$.

Intuitively, $\succ$ will be a binary relation between tuples from the same
(database) relation. We say that a tuple $t_1$ dominates a tuple $t_2$ in
$\succ$ if $t_1 \succ t_2$.

Typical properties of the relation $\succ$ include:

• irreflexivity: $\forall x.\ x \not\succ x$,

• asymmetry: $\forall x, y.\ x \succ y \Rightarrow y \not\succ x$,

• transitivity: $\forall x, y, z.\ (x \succ y \land y \succ z) \Rightarrow x \succ z$,

• negative transitivity: $\forall x, y, z.\ (x \not\succ y \land y \not\succ z) \Rightarrow x \not\succ z$,

• connectivity: $\forall x, y.\ x \succ y \lor y \succ x \lor x = y$.

The relation $\succ$ is:

• a strict partial order if it is irreflexive and transitive (thus also asymmetric);

• a weak order if it is a negatively transitive strict partial order;

• a total order if it is a connected strict partial order.

At this point, we do not assume any properties of $\succ$, although in most
applications it will satisfy at least the properties of strict partial order.
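
The order properties above can be tested by brute force on a finite sample of values. The sketch below is only illustrative: the function names and the three sample relations are made up for this purpose, and a check over a finite sample does not establish a property over the whole (infinite) domain.

```python
from itertools import product


def is_irreflexive(dom, pref):
    return all(not pref(x, x) for x in dom)


def is_transitive(dom, pref):
    return all(
        pref(x, z)
        for x, y, z in product(dom, repeat=3)
        if pref(x, y) and pref(y, z)
    )


def is_negatively_transitive(dom, pref):
    return all(
        not pref(x, z)
        for x, y, z in product(dom, repeat=3)
        if not pref(x, y) and not pref(y, z)
    )


def is_connected(dom, pref):
    return all(pref(x, y) or pref(y, x) or x == y for x, y in product(dom, repeat=2))


def is_strict_partial_order(dom, pref):
    return is_irreflexive(dom, pref) and is_transitive(dom, pref)


def is_weak_order(dom, pref):
    return is_strict_partial_order(dom, pref) and is_negatively_transitive(dom, pref)


def is_total_order(dom, pref):
    return is_strict_partial_order(dom, pref) and is_connected(dom, pref)


# Illustrative relations over a small sample (a list, so it can be re-iterated).
numbers = [1, 2, 3, 4]
greater = lambda x, y: x > y                          # total order
coarser = lambda x, y: x // 2 > y // 2                # weak order: 2 and 3 tie
proper_divisor = lambda x, y: x != y and y % x == 0   # strict partial order only

print(is_total_order(numbers, greater))
print(is_weak_order(numbers, coarser), is_total_order(numbers, coarser))
print(is_strict_partial_order(numbers, proper_divisor), is_weak_order(numbers, proper_divisor))
```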

Definition 2 A preference formula (pf) $C(t_1,t_2)$ is a first-order formula
defining a preference relation $\succ_C$ in the standard sense, namely

$t_1 \succ_C t_2$ iff $C(t_1,t_2)$.

An intrinsic preference formula (ipf) is a preference formula that uses only
built-in predicates.

We will limit our attention to preference relations defined using intrinsic
preference formulas. Most preference relations are of this form. Moreover, for
intrinsic preference relations testing a pair of tuples for dominance can be
entirely done on the basis of the contents of those tuples; no database queries
need to be evaluated.

Because we consider two specific domains, D and Q, we will have two kinds 
of variables, D-variables and Q-variables, and two kinds of atomic formulas: 

• equality constraints: $x = y$, $x \ne y$, $x = c$, or $x \ne c$, where $x$
and $y$ are D-variables, and $c$ is an uninterpreted constant;

• rational-order constraints: $x \theta y$ or $x \theta c$, where
$\theta \in \{=, \ne, <, >, \le, \ge\}$, $x$ and $y$ are Q-variables, and $c$ is
a rational number.

Without loss of generality, we will assume that ipfs are in DNF (Disjunctive 
Normal Form) and quantifier-free (the theories involving the above domains 
admit quantifier elimination). We also assume that atomic formulas are closed 
under negation (also satisfied by the above theories). An ipf all of whose
atomic formulas are equality (resp. rational-order) constraints will be called
an equality (resp. rational-order) ipf. If both equality and rational-order
constraints are allowed in a formula, the formula will be called
equality/rational-order. Clearly, ipfs are a special case of general constraints
[28], and define fixed, although possibly infinite, relations. By using the
notation $\succ_C$ for a preference relation, we assume that there is an
underlying preference formula $C$.

Every preference relation $\succ_C$ generates an indifference relation
$\sim_C$: two tuples $t_1$ and $t_2$ are indifferent ($t_1 \sim_C t_2$) if
neither is preferred to the other one, i.e., $t_1 \not\succ_C t_2$ and
$t_2 \not\succ_C t_1$.

Proposition 1 For every preference relation $\succ_C$, every relation $r$ and
every pair of tuples $t_1, t_2 \in \omega_C(r)$, we have $t_1 = t_2$ or
$t_1 \sim_C t_2$.

Complex preference relations can be easily defined using Boolean connectives.
Here we define a special operator: prioritized composition. The prioritized
composition $\succ_{C_1} \rhd \succ_{C_2}$ has the following intuitive reading:
prefer according to $\succ_{C_2}$ unless $\succ_{C_1}$ is applicable.

Definition 3 Consider two preference relations $\succ_{C_1}$ and $\succ_{C_2}$
defined over the same schema $R$. The prioritized composition
$\succ_{C_{1,2}} = \succ_{C_1} \rhd \succ_{C_2}$ of $\succ_{C_1}$ and
$\succ_{C_2}$ is a preference relation over $R$ defined as:

$t_1 \succ_{C_{1,2}} t_2 \equiv t_1 \succ_{C_1} t_2 \lor (t_1 \sim_{C_1} t_2 \land t_1 \succ_{C_2} t_2)$.
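
Prioritized composition translates directly into a higher-order function. The following is a minimal sketch, assuming preference relations are represented as Python predicates on pairs of tuples; the helper names and the (price, rating) tuples are hypothetical.

```python
def indifference(pref):
    """The indifference relation generated by the preference predicate pref."""
    return lambda a, b: not pref(a, b) and not pref(b, a)


def prioritized(pref1, pref2):
    """Prioritized composition: pref1 decides first; pref2 is consulted
    only for pairs of tuples that are indifferent under pref1."""
    indiff1 = indifference(pref1)
    return lambda a, b: pref1(a, b) or (indiff1(a, b) and pref2(a, b))


# Hypothetical tuples of the form (price, rating).
cheaper = lambda a, b: a[0] < b[0]
better_rated = lambda a, b: a[1] > b[1]
price_then_rating = prioritized(cheaper, better_rated)

print(price_then_rating((10, 3), (12, 5)))  # decided by price alone
print(price_then_rating((10, 5), (10, 3)))  # equal price, so rating decides
```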
2.2 Winnow

We define now an algebraic operator that picks from a given relation the set 
of the most preferred tuples, according to a given preference formula. 

Definition 4 If $R$ is a relation schema and $C$ a preference formula defining a
preference relation $\succ_C$ over $R$, then the winnow operator is written as
$\omega_C(R)$, and for every instance $r$ of $R$:

$\omega_C(r) = \{t \in r \mid \neg\exists t' \in r.\ t' \succ_C t\}$.

A preference query is a relational algebra query containing at least one occur- 
rence of the winnow operator. 

Example 3 Consider the relation Book(ISBN, Vendor, Price) (Example 1).
The preference relation $\succ_{C_1}$ from this example can be defined using the
rational-order ipf $C_1$:

$(i,v,p) \succ_{C_1} (i',v',p') \equiv i = i' \land p < p'$.

The answer to the preference query $\omega_{C_1}(Book)$ provides for every book
the information about the vendors offering the lowest price for that book. Note
that the preference relation $\succ_{C_1}$ is a strict partial order.
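
The definition of winnow can be run directly on the Book instance of Figure 1. The sketch below is a naive quadratic evaluation, not one of the specialized algorithms discussed later; the names BOOKS, c1 and winnow are introduced only for this illustration.

```python
# The Book instance of Figure 1 as (ISBN, Vendor, Price) tuples.
BOOKS = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]


def c1(a, b):
    """(i, v, p) is preferred to (i', v', p') iff i = i' and p < p'."""
    return a[0] == b[0] and a[2] < b[2]


def winnow(prefers, rows):
    """Return the tuples of rows that no tuple of rows dominates."""
    return [t for t in rows if not any(prefers(s, t) for s in rows)]


best_offers = winnow(c1, BOOKS)
for offer in best_offers:
    print(offer)
```

For each ISBN, only the offers with the lowest price survive: the LowestPrices offer for 0679726691 and the single offers for the other two books.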

Example 4 To see another kind of preferences, consider the following
preference relation $\succ_{C_3}$:

I prefer Warsaw to any other city and prefer any city to Moscow.

This preference relation can be formulated as an equality ipf $C_3$:

$x \succ_{C_3} y \equiv x = \text{'Warsaw'} \land y \ne \text{'Warsaw'} \lor x \ne \text{'Moscow'} \land y = \text{'Moscow'}$.



2.3 Constraint-generating dependencies 

We assume that we are working in the context of a single relation schema R 
and all the integrity constraints are over that schema. The set of all instances 
of $R$ satisfying a set of integrity constraints $F$ is denoted as $Sat(F)$. We
say that $F$ entails an integrity constraint $f$, written $F \vdash f$, if every
instance satisfying $F$ also satisfies $f$.

Remember that constraints are arbitrary quantifier-free formulas over some 
constraint theory - here D or Q. 






Definition 5 A constraint-generating dependency (CGD) can be expressed as
a formula of the following form:

$\forall t_1 \ldots \forall t_k.\ R(t_1) \land \cdots \land R(t_k) \land \gamma(t_1, \ldots, t_k) \Rightarrow \gamma'(t_1, \ldots, t_k)$

where $\gamma(t_1, \ldots, t_k)$ and $\gamma'(t_1, \ldots, t_k)$ are
constraints. Such a dependency is called a $k$-dependency.

CGDs are equivalent to denial constraints. Functional dependencies (FDs) are
2-CGDs, because a functional dependency (FD) $f \equiv X \rightarrow Y$, where
$X$ and $Y$ are sets of attributes of $R$, can be written down as the following
logic formula:

$\forall t_1.\forall t_2.\ R(t_1) \land R(t_2) \land t_1[X] = t_2[X] \Rightarrow t_1[Y] = t_2[Y]$.

Note that the set of attributes $X$ in $X \rightarrow Y$ may be empty, meaning
that each attribute in $Y$ can assume only a single value.

Example 5 We give here further examples of CGDs. Consider the relation
Emp with attributes Name, Salary, and Manager, with Name being the primary
key. The constraint that no employee can have a salary greater than that of her
manager is a CGD:

$\forall n, s, m, s', m'.\ Emp(n, s, m) \land Emp(m, s', m') \Rightarrow s \le s'$.

Similarly, single-tuple constraints (CHECK constraints in SQL2) are a special
case of CGDs. For example, the constraint that no employee can have a salary
over $200000 is expressed as:

$\forall n, s, m.\ Emp(n, s, m) \Rightarrow s \le 200000$.
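
Checking whether a given finite instance satisfies a $k$-dependency is a simple loop over all $k$-tuples of rows. The following sketch makes that concrete; the function names and the Emp rows are hypothetical, and the body and head constraints are passed as Python predicates. It checks satisfaction on one instance only, not entailment between dependencies.

```python
from itertools import product


def satisfies_cgd(rows, k, body, head):
    """Check a k-dependency: every k tuples of rows that satisfy the body
    constraint must also satisfy the head constraint."""
    return all(head(*ts) for ts in product(rows, repeat=k) if body(*ts))


def satisfies_fd(rows, lhs, rhs):
    """An FD X -> Y viewed as a 2-dependency; lhs and rhs list attribute
    positions. An empty lhs gives the 0-ary case."""
    same = lambda attrs: (lambda t1, t2: all(t1[a] == t2[a] for a in attrs))
    return satisfies_cgd(rows, 2, same(lhs), same(rhs))


# Hypothetical Emp instance with attributes (Name, Salary, Manager).
EMP = [("ann", 90000, "cat"), ("bob", 80000, "cat"), ("cat", 120000, "cat")]

# No employee earns more than her manager: the body joins t1.Manager with t2.Name.
reports_to = lambda t1, t2: t1[2] == t2[0]
earns_no_more = lambda t1, t2: t1[1] <= t2[1]
manager_rule = satisfies_cgd(EMP, 2, reports_to, earns_no_more)

# Single-tuple constraint: no salary over 200000.
salary_cap = satisfies_cgd(EMP, 1, lambda t: True, lambda t: t[1] <= 200000)

# Name is the primary key, so in particular Name -> Salary holds.
name_fd = satisfies_fd(EMP, [0], [1])

print(manager_rule, salary_cap, name_fd)
```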



An effective reduction, using symmetrization, from the entailment of CGDs to
the validity of $\forall$-formulas (or, equivalently, to the unsatisfiability of
quantifier-free formulas) in the underlying constraint theory is available in
the literature. This reduction is described in Section 7. A similar construction
using symbol mappings has also been proposed.



3 Properties relative to integrity constraints 



We define here two properties fundamental to semantic query optimization 
of preference queries: containment of preference relations and satisfaction of 
order axioms. 

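Definition 6 A preference relation $\succ_{C_1}$ over a schema $R$ is contained
in a preference relation $\succ_{C_2}$ over the same schema relative to a set of
integrity constraints $F$, written as $\succ_{C_1} \subseteq_F \succ_{C_2}$, if

$\forall r \in Sat(F).\ \forall t_1, t_2 \in r.\ t_1 \succ_{C_1} t_2 \Rightarrow t_1 \succ_{C_2} t_2$.

Clearly, $\succ_{C_1} \subseteq_F \succ_{C_2}$ iff $F \vdash d_0^{C_1,C_2}$, where

$d_0^{C_1,C_2}$: $\forall t_1, t_2.\ R(t_1) \land R(t_2) \land t_1 \succ_{C_1} t_2 \Rightarrow t_1 \succ_{C_2} t_2$.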


Satisfaction of order axioms relative to integrity constraints is defined similarly 
- by relativizing the universal quantifiers in the axioms. Since in this paper 
we are interested in strict partial and weak orders, we define the following. 

Definition 7 A preference $\succ_C$ over a schema $R$ is a strict partial order
relative to a set of integrity constraints $F$ if:

$\forall r \in Sat(F).\ \forall t \in r.\ t \not\succ_C t$

$\forall r \in Sat(F).\ \forall t_1, t_2, t_3 \in r.\ t_1 \succ_C t_2 \land t_2 \succ_C t_3 \Rightarrow t_1 \succ_C t_3$.

Definition 8 A preference $\succ_C$ over a schema $R$ is a weak order relative
to a set of integrity constraints $F$ if it is a strict partial order relative
to $F$ and

$\forall r \in Sat(F).\ \forall t_1, t_2, t_3 \in r.\ t_1 \not\succ_C t_2 \land t_2 \not\succ_C t_3 \Rightarrow t_1 \not\succ_C t_3$.

Again, it is clear that the above properties can be expressed in terms of the
entailment of CGDs.



4 Eliminating redundant occurrences of winnow

We consider here two situations in which an occurrence of winnow in a preference
query may be eliminated. The first case is that when a single application
of winnow does not remove any tuples, and is thus redundant. The second
case is more subtle: the interaction between two consecutive applications of
winnow is such that one can be eliminated.

Given an instance $r$ of $R$, the operator $\omega_C$ is redundant if
$\omega_C(r) = r$. If we consider the class of all instances of $R$, then such
an operator is redundant for every instance iff $\succ_C$ is an empty relation.
The latter holds iff $C$ is unsatisfiable. However, we are interested only in
the instances satisfying a given set of integrity constraints.

Definition 9 The operator $\omega_C$ is redundant relative to a set of
integrity constraints $F$ if $\forall r \in Sat(F)$, $\omega_C(r) = r$.

Theorem 1 $\omega_C$ is redundant relative to a set of FDs $F$ iff
$\succ_C \subseteq_F \succ_{False}$, where

$t_1 \succ_{False} t_2 \equiv False$.

Proof. Assume $t_1, t_2 \in r$ for some $r \in Sat(F)$ and $t_1 \succ_C t_2$.
Then $t_2 \notin \omega_C(r)$. In the other direction, assume that for some
$r \in Sat(F)$, $\omega_C(r) \subset r$. Thus, there must be $t_1, t_2 \in r$
such that $t_1 \succ_C t_2$. ■

Clearly, $\omega_C$ is redundant relative to $F$ iff $F$ entails the following CGD:

$d_1^C$: $\forall t_1, t_2.\ R(t_1) \land R(t_2) \Rightarrow t_1 \not\succ_C t_2$.

The CGD $d_1^C$ always holds in the result of winnow $\omega_C(R)$ and simply
says that all the tuples in this result are mutually indifferent.

Example 6 Consider Example 3 in which the FD $ISBN \rightarrow Price$ holds.
$\omega_{C_1}$ is redundant relative to $ISBN \rightarrow Price$ because this
dependency entails (is, in fact, equivalent to) the dependency

$\forall i_1, v_1, p_1, i_2, v_2, p_2.\ Book(i_1, v_1, p_1) \land Book(i_2, v_2, p_2) \Rightarrow i_1 \ne i_2 \lor p_1 \ge p_2$.



The second case where an occurrence of winnow can be eliminated is as follows. 

Theorem 2 Assume $F$ is a set of integrity constraints over a schema $R$. If
$\succ_{C_1}$ and $\succ_{C_2}$ are preference relations over $R$ such that
$\succ_{C_1}$ and $\succ_{C_2}$ are strict partial orders relative to $F$ and
$\succ_{C_1} \subseteq_F \succ_{C_2}$, then for all instances $r \in Sat(F)$:

$\omega_{C_1}(\omega_{C_2}(r)) = \omega_{C_2}(\omega_{C_1}(r)) = \omega_{C_2}(r)$.

Proof. Theorem 6.1 in [9] is a similar result which does not, however, relativize
the properties of the given preference relations to the set of instances satisfying
the given integrity constraints. The proof of that result can be easily adapted
here. ■

Note that in the case of strict partial orders, Theorem 2 implies one direction
of Theorem 1.



5 Weak orders



We have defined weak orders as negatively transitive strict partial orders. 
Equivalently, they can be defined as strict partial orders for which the indif- 
ference relation is transitive. Intuitively, a weak order consists of a number 
(perhaps infinite) of linearly ordered layers. In each layer, all the elements are 
mutually indifferent and they are all above all the elements in lower layers. 

Example 7 In the preference relation $\succ_{C_1}$ of Example 3, the first,
second and third tuples are indifferent with the fourth and fifth tuples.
However, the second tuple is preferred to the first, violating the transitivity
of indifference. Therefore, the preference relation $\succ_{C_1}$ is not a weak
order.
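
The violation described in Example 7 can be exhibited mechanically on the Book instance of Figure 1. The sketch below lists all triples that witness a failure of transitivity of indifference; the names BOOKS, c1, indifferent and indifference_violations are introduced only for this illustration.

```python
from itertools import permutations

# The Book instance of Figure 1 as (ISBN, Vendor, Price) tuples.
BOOKS = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]


def c1(a, b):
    """Same ISBN and strictly lower price."""
    return a[0] == b[0] and a[2] < b[2]


def indifferent(a, b):
    """Neither tuple is preferred to the other."""
    return not c1(a, b) and not c1(b, a)


def indifference_violations(rows):
    """Triples (a, b, c) with a ~ b and b ~ c, but a and c not indifferent."""
    return [
        (a, b, c)
        for a, b, c in permutations(rows, 3)
        if indifferent(a, b) and indifferent(b, c) and not indifferent(a, c)
    ]


violations = indifference_violations(BOOKS)
print(len(violations) > 0)  # at least one witness exists
print(violations[0])
```

One such witness consists of the first, fourth and second tuples: the first and the second are each indifferent to the fourth (different ISBNs), yet they are not indifferent to each other.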

Example 8 A preference relation $\succ_{C_f}$ defined as

$x \succ_{C_f} y \equiv f(x) > f(y)$

for some real-valued function f , is a weak order but not necessarily a total 
order. 

5.1 Computing winnow 

Various algorithms for evaluating winnow have been proposed in the literature. 
We discuss here those that have a good blocking behavior and thus are capable 
of efficiently processing very large data sets. 

We first review BNL (Figure 3), a basic algorithm for evaluating winnow, and 
then show that for preference relations that are weak orders a much simpler 
and more efficient algorithm is possible. BNL was proposed in the context
of skyline queries. However, it has also been noted that the algorithm requires only the
properties of strict partial orders. BNL uses a fixed amount of main memory (a 
window). It also needs a temporary table for the tuples whose status cannot be 
determined in the current pass, because the available amount of main memory 
is limited. 

BNL keeps in the window the best tuples discovered so far (some of such
tuples may also be in the temporary table). All the tuples in the window are
mutually indifferent and they all need to be kept, since each may turn out
to dominate some input tuple arriving later. For weak orders, however, if a
tuple $t_1$ dominates $t_2$, then any tuple indifferent to $t_1$ will also
dominate $t_2$. In this case, indifference is an equivalence relation, and thus
it is enough to keep in main memory only a single tuple top from the top
equivalence class. In addition, one has to keep track of all members of that
class (called the current bucket B), since they may have to be returned as the
result of the winnow. Those ideas are behind a new algorithm WWO (Winnow for
Weak Orders), shown in Figure 4.

(1) clear the window W and the temporary table F;
(2) make r the input;
(3) repeat the following until the input is empty:
    (a) for every tuple t in the input:
        • t is dominated by a tuple in W => ignore t,
        • t dominates some tuples in W => eliminate the dominated tuples
          and insert t into W,
        • if t and all tuples in W are mutually indifferent => insert t
          into W (if there is room), otherwise add t to F;
    (b) output the tuples from W that were added there when F was empty,
    (c) make F the input, clear the temporary table.

Figure 3. BNL: Blocked Nested Loops
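
The BNL procedure of Figure 3 can be sketched in Python as follows. This is an in-memory simplification under stated assumptions, not the original implementation: the temporary table is a plain list, the window capacity is a tuple count (window_size, a hypothetical parameter), and window tuples that were inserted after the temporary table became non-empty are simply carried over to the next pass instead of being timestamped. The preference is assumed to be a strict partial order.

```python
def bnl(rows, prefers, window_size=2):
    """Blocked-nested-loops style winnow for a strict partial order."""
    if window_size < 1:
        raise ValueError("the window must hold at least one tuple")
    result = []
    pending = list(rows)
    while pending:
        window = []  # pairs (tuple, inserted_while_temp_was_empty)
        temp = []    # tuples whose status cannot be settled in this pass
        for t in pending:
            if any(prefers(w, t) for w, _ in window):
                continue  # t is dominated by a window tuple: ignore it
            # Eliminate the window tuples that t dominates.
            window = [(w, early) for w, early in window if not prefers(t, w)]
            if len(window) < window_size:
                window.append((t, not temp))
            else:
                temp.append(t)
        # Tuples that entered the window while temp was empty have been
        # compared with every tuple still in play, so they can be output.
        result.extend(w for w, early in window if early)
        # The remaining tuples are re-examined in the next pass.
        pending = temp + [w for w, early in window if not early]
    return result


# The Book instance of Figure 1 and the preference of Example 3.
BOOKS = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]
c1 = lambda a, b: a[0] == b[0] and a[2] < b[2]

print(bnl(BOOKS, c1, window_size=1))
```

Each pass settles at least the first tuple it reads (that tuple is either output or eliminated), so the loop terminates for any window size of at least one.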



(1) top := the first input tuple
(2) B := {top}
(3) for every subsequent tuple t in the input:
    • t is dominated by top => ignore t,
    • t dominates top => top := t; B := {t}
    • t and top are indifferent => B := B ∪ {t}
(4) output B

Figure 4. WWO: Winnow for Weak Orders
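
Figure 4 maps almost line by line onto Python. The sketch below assumes that the supplied preference is a weak order (the function does not verify this); returning an empty list for empty input is an added convention that Figure 4 does not cover. The offers and the vendor BargainBooks are hypothetical.

```python
def wwo(rows, prefers):
    """Single-pass winnow, valid when prefers is a weak order."""
    rows = iter(rows)
    try:
        top = next(rows)          # (1) top := the first input tuple
    except StopIteration:
        return []                 # empty input: an assumption of this sketch
    bucket = [top]                # (2) B := {top}
    for t in rows:                # (3) scan the remaining tuples once
        if prefers(top, t):
            continue              # t is dominated by top: ignore it
        if prefers(t, top):
            top, bucket = t, [t]  # t starts a new, better bucket
        else:
            bucket.append(t)      # t is indifferent to top
    return bucket                 # (4) output B


# Hypothetical offers (Vendor, Price); "lower price" is a weak order.
OFFERS = [
    ("BooksForLess", 14.75),
    ("LowestPrices", 13.50),
    ("QualityBooks", 18.80),
    ("BargainBooks", 13.50),
]
cheaper = lambda a, b: a[1] < b[1]
cheapest = wwo(OFFERS, cheaper)
print(cheapest)
```

Only one comparison partner (top) is consulted per input tuple, which is where the saving over BNL comes from.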



It is clear that WWO requires only a single pass over the input. It uses addi- 
tional memory (whose size is at most equal to the size of the input) to keep 
track of the current bucket. However, this memory is only written and read 
once, the latter at the end of the execution of the algorithm. Clearly, for weak 
orders WWO is considerably more efficient than BNL. Note that for weak 
orders BNL does not simply reduce to WWO: BNL keeps the mutually in- 
different tuples from the currently top layer in the main memory window (or 
in the temporary table) and compares all of them with the input tuple. The 
latter is clearly superfluous for preference relations that are weak orders. Note 
also that if additional memory is not available, WWO can execute in a small, 
fixed amount of memory by using two passes over the input: in the first, a
top tuple is identified, and in the second, all the tuples indifferent to it are
selected.



In earlier work we proposed SFS, a more efficient variant of BNL for skyline
queries, in which a presorting step is used. Because sorting may require more
than one pass over the input, that approach will also be less efficient than
WWO for weak orders (unless the input is already sorted).

Even if a preference relation $\succ_C$ is not a weak order in general, it may
be a weak order relative to a class of integrity constraints $F$. In those
cases, WWO is still applicable. Note that in such a case the original definition
of $\succ_C$ can still be used for tuple comparison.

Example 9 Consider Example 3, this time with the 0-ary FD
$\emptyset \rightarrow ISBN$. (Such a dependency might hold, for example, in a
relation resulting from the selection $\sigma_{ISBN=c}$ for some constant $c$.)
We already know that the preference relation $\succ_{C_1}$ is a strict partial
order. Being a weak order relative to this FD is captured by the following CGD:

$\forall i_1, v_1, p_1, i_2, v_2, p_2, i_3, v_3, p_3.$

$Book(i_1, v_1, p_1) \land Book(i_2, v_2, p_2) \land Book(i_3, v_3, p_3) \land \phi_1 \Rightarrow \phi_2$

where

$\phi_1$: $(i_1 \ne i_2 \lor p_1 \ge p_2) \land (i_2 \ne i_3 \lor p_2 \ge p_3)$

and

$\phi_2$: $(i_1 \ne i_3 \lor p_1 \ge p_3)$.

We show now that this CGD is entailed by the FD $\emptyset \rightarrow ISBN$.
Assume this is not the case. Then there is an instance of the relation Book
consisting of tuples $(i_1, v_1, p_1)$, $(i_2, v_2, p_2)$, and $(i_3, v_3, p_3)$
such that $\phi_1$ is satisfied but $\phi_2$ is not. This instance also
satisfies the FD, thus $i_1 = i_2 = i_3$. We consider the formula
$\phi_1 \land \neg\phi_2$ which can be simplified to

$p_1 \ge p_2 \land p_2 \ge p_3 \land p_1 < p_3$.

The last formula is unsatisfiable.



5.2 Collapsing winnow 



We show here that for weak orders consecutive applications of winnow can 
be collapsed to a single one, using prioritized composition. In contrast with 
Theorem 2, here we do not impose any conditions on the relationship between
the preference relations involved. Recall that

$d_1^{C_1}$: $\forall t_1, t_2.\ R(t_1) \land R(t_2) \Rightarrow t_1 \not\succ_{C_1} t_2$.

Theorem 3 Assume $F$ is a set of integrity constraints over a schema $R$. If
$\succ_{C_1}$ and $\succ_{C_2}$ are preference relations over $R$ such that
$\succ_{C_1}$ is a weak order relative to $F$, then for all instances
$r \in Sat(F)$:

$\omega_{C_1 \rhd C_2}(r) = \omega_{C_2}(\omega_{C_1}(r))$.

Additionally, if $\succ_{C_2}$ is a weak order relative to
$F \cup \{d_1^{C_1}\}$, then also $\succ_{C_1 \rhd C_2}$ is a weak order
relative to $F$.

Proof. Let $r \in Sat(F)$. Assume $t \in \omega_{C_2}(\omega_{C_1}(r))$ and
$t \notin \omega_{C_1 \rhd C_2}(r)$. Then there exists $s \in r$ such that
$s \succ_{C_1 \rhd C_2} t$. If $s \succ_{C_1} t$, then
$t \notin \omega_{C_1}(r)$ and $t \notin \omega_{C_2}(\omega_{C_1}(r))$.
Otherwise, $s \sim_{C_1} t$ and $s \succ_{C_2} t$. If $s \in \omega_{C_1}(r)$,
then $t \notin \omega_{C_2}(\omega_{C_1}(r))$. If $s \notin \omega_{C_1}(r)$,
then for some $s' \in r$, $s' \succ_{C_1} s$. But then $s' \succ_{C_1} t$
because $\succ_{C_1}$ is a weak order, and consequently
$t \notin \omega_{C_1}(r)$.

In the other direction, assume $t \in \omega_{C_1 \rhd C_2}(r)$ and
$t \notin \omega_{C_2}(\omega_{C_1}(r))$. If $t \notin \omega_{C_1}(r)$, then
for some $s \in r$, $s \succ_{C_1} t$. Thus, $s \succ_{C_1 \rhd C_2} t$ and
$t \notin \omega_{C_1 \rhd C_2}(r)$. If $t \in \omega_{C_1}(r)$, then for some
$s \in \omega_{C_1}(r)$, $s \sim_{C_1} t$ and $s \succ_{C_2} t$. Thus again,
$s \succ_{C_1 \rhd C_2} t$.

The second part of the theorem can be proved in the same way as Proposition
4.6 in earlier work. We can require that $\succ_{C_2}$ be a weak order relative
to $F \cup \{d_1^{C_1}\}$, not just to $F$, because the dependency $d_1^{C_1}$
is guaranteed to hold in $\omega_{C_1}(r)$. ■

We show now how Theorem 3 can be used in query optimization. Consider the
choice between WWO and BNL in the context of Theorem 3. If both $\succ_{C_1}$
and $\succ_{C_2}$ are weak orders (relative to $F$), then it is better to
evaluate $\omega_{C_1 \rhd C_2}(r)$ than $\omega_{C_2}(\omega_{C_1}(r))$ because
the former does not require creating intermediate results. In both cases WWO can
be used. If $\succ_{C_2}$ is a strict partial order but not necessarily a weak
order (relative to $F$), then in both cases we will have to use BNL
($C_1 \rhd C_2$ is a strict partial order), so again
$\omega_{C_1 \rhd C_2}(r)$ wins. However, if $\omega_{C_1}(r)$ is small, it
would be better to use WWO to compute $r_1 = \omega_{C_1}(r)$ and then compute
$\omega_{C_2}(r_1)$ using BNL.

Consider now the presence of views. If $\omega_{C_1}(R)$ is a non-materialized
view, then the query $\omega_{C_2}(\omega_{C_1}(R))$ can be first rewritten as
$\omega_{C_1 \rhd C_2}(R)$ and then evaluated without the need for the nested
evaluation of $\omega_{C_1}(R)$. On the other hand, if $\omega_{C_1}(R)$ is a
materialized view $V$, then it can be used to answer the query
$\omega_{C_1 \rhd C_2}(R)$ by computing $\omega_{C_2}(V)$.
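
The rewriting licensed by Theorem 3 can be spot-checked on a small instance. The sketch below compares the nested and the collapsed evaluation using a naive winnow; the offers and helper names are hypothetical, and a check on one instance only illustrates the theorem, it does not prove it.

```python
def winnow(prefers, rows):
    """Naive winnow: keep the tuples that nothing in rows dominates."""
    return [t for t in rows if not any(prefers(s, t) for s in rows)]


def prioritized(pref1, pref2):
    """Prioritized composition of two preference predicates."""
    def composed(a, b):
        indifferent1 = not pref1(a, b) and not pref1(b, a)
        return pref1(a, b) or (indifferent1 and pref2(a, b))
    return composed


# Hypothetical offers (Vendor, Price, Rating).
OFFERS = [("a", 10, 3), ("b", 10, 5), ("c", 12, 5), ("d", 10, 5)]
cheaper = lambda s, t: s[1] < t[1]       # a weak order
better_rated = lambda s, t: s[2] > t[2]  # the second preference

nested = winnow(better_rated, winnow(cheaper, OFFERS))
collapsed = winnow(prioritized(cheaper, better_rated), OFFERS)
print(nested == collapsed, collapsed)
```

The collapsed form makes a single pass over the data and never materializes the intermediate result of the inner winnow.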
5.3 Further properties



The list of preference query properties that hold relative to a set of integrity 
constraints does not end with those formulated above. There are other
algebraic properties that hold conditionally. Such properties can often be
formulated in a more general form using CGDs.

For example, consider the commutativity of winnow and selection, for which
the following result is known:

Proposition 2 Given a relation schema $R$, a selection condition $C_1$ over $R$
and a preference formula $C_2$ over $R$, if the formula

$\forall t_1, t_2 [(C_1(t_2) \land C_2(t_1, t_2)) \Rightarrow C_1(t_1)]$

is valid, then for all instances $r$ of $R$:

$\sigma_{C_1}(\omega_{C_2}(r)) = \omega_{C_2}(\sigma_{C_1}(r))$.

This result can be generalized to hold relative to a set of integrity constraints. 



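Theorem 4 Given a relation schema $R$, a selection condition $C_1$ over $R$, a
preference formula $C_2$ over $R$, and a set of integrity constraints $F$ over
$R$, if $F \vdash d_2^{C_1,C_2}$ where

$d_2^{C_1,C_2}$: $\forall t_1, t_2.\ R(t_1) \land R(t_2) \land C_1(t_2) \land C_2(t_1, t_2) \Rightarrow C_1(t_1)$,

then for all instances $r \in Sat(F)$:

$\sigma_{C_1}(\omega_{C_2}(r)) = \omega_{C_2}(\sigma_{C_1}(r))$.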

6 Propagation of integrity constraints 



How do we know whether a specific CGD holds in a relation? If this is a 
database relation, then the CGD may be enforced by the DBMS or the ap- 
plication. If the relation is computed, then we need to determine if the CGD 
is preserved in the expression defining the relation. Known results characterize
cases where functional and join dependencies hold in the results of relational
algebra expressions.

We already know that the CGD $d_1^C$, which states that no tuple dominates
another, holds in the result of the winnow $\omega_C$. Winnow returns a subset
of a given relation, thus it preserves all the CGDs holding in the relation. The
following theorem characterizes all the dependencies holding in the result of
winnow.

Theorem 5 Assume $F$ is a set of CGDs, $f$ a CGD over a schema $R$, and
$\succ_C$ an irreflexive preference relation over $R$. Then
$F \cup \{d_1^C\} \vdash f$ iff for every $r \in Sat(F)$,
$\omega_C(r) \in Sat(f)$.

Proof. Assume it is not the case that $F \cup \{d_1^C\} \vdash f$. Thus for some
instance $r_0$, $r_0 \in Sat(F \cup \{d_1^C\})$ but $r_0 \notin Sat(f)$. Then
for all $t_1, t_2 \in r_0$, $t_1 \sim_C t_2$, and thus $r_0 = \omega_C(r_0)$.
Therefore, $\omega_C(r_0) \notin Sat(f)$. In the other direction, assume for
some $r_0 \in Sat(F)$, $\omega_C(r_0) \notin Sat(f)$. Thus
$r_1 = \omega_C(r_0)$ is the instance satisfying $F \cup \{d_1^C\}$ and
violating $f$, which provides a counterexample to the entailment of $f$ by
$F \cup \{d_1^C\}$. ■

Example 10 Consider Example 3. The FD $ISBN \rightarrow Price$ holds in
the result of $\omega_{C_1}$, because it is entailed by the CGD $d_0^{C_1}$

$$d_0^{C_1}: \forall i_1, v_1, p_1, i_2, v_2, p_2, i_3, v_3, p_3.\ Book(i_1, v_1, p_1) \wedge Book(i_2, v_2, p_2) \Rightarrow (i_1 \neq i_2 \vee p_1 \geq p_2)$$

even though it might not hold in the input relation Book.
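
A minimal sketch of this effect in Python follows. It assumes that the preference of Example 3 prefers, among tuples with the same ISBN, the one with the lower price (this matches the CGD shown above but is an assumption here); the instance `book` and the helper names are hypothetical.

```python
ISBN, VENDOR, PRICE = 0, 1, 2

def winnow(prefers, r):
    """Tuples of r to which no tuple of r is preferred."""
    return {t for t in r if not any(prefers(s, t) for s in r)}

def satisfies_fd(r, lhs, rhs):
    """Check the FD lhs -> rhs on r; attributes are tuple positions."""
    return all(
        any(t1[a] != t2[a] for a in lhs) or all(t1[b] == t2[b] for b in rhs)
        for t1 in r
        for t2 in r
    )

def prefers_c1(t1, t2):
    # Assumed preference: same ISBN, lower price.
    return t1[ISBN] == t2[ISBN] and t1[PRICE] < t2[PRICE]

# Hypothetical input relation in which ISBN -> Price does not hold.
book = {
    ("i1", "v1", 15),
    ("i1", "v2", 13),
    ("i1", "v3", 13),
    ("i2", "v1", 7),
}

best = winnow(prefers_c1, book)
print(satisfies_fd(book, [ISBN], [PRICE]))  # False
print(satisfies_fd(best, [ISBN], [PRICE]))  # True
```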



7 Computational complexity 

Here we address the computational issues involved in checking the semantic 
properties essential for the semantic optimization of preference queries. We 
have shown that such properties can be formulated in terms of the entailment 
of CGDs. We assume that we are dealing with k-dependencies for some fixed
$k \geq 1$. For example, for FDs k = 2. Notice also that all the interesting properties
studied in this paper, e.g., containment or weak order, can be expressed
as k-dependencies for $k \leq 3$.

We assume here that the CGDs under consideration are clausal: the constraint 
in the body is a conjunction of atomic constraints and the head consists of 
a disjunction of atomic constraints. All the dependencies that we have found 
useful in the context of semantic optimization of preference queries are clausal. 

7.1 Upper bounds 

0] shows a reduction from the entailment of CGDs to the validity of universal
formulas in the constraint theory. The basic idea is simple: the entailment of
k-dependencies needs to be considered over relation instances of cardinality at
most k, and each such instance can be represented by k tuple variables. Each
dependency f is mapped to a constraint formula $cf_k(f)$. Then the entailment
of a CGD $f_0$ by a set of CGDs F is expressed as the validity of the formula:

$$\forall^*.\ \Big(\bigwedge_{f \in F} cf_k(f)\Big) \Rightarrow cf_k(f_0),$$

or equivalently, as the unsatisfiability of a quantifier-free CNF formula
obtained from its negation.

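The following example illustrates the construction of $cf_k(f)$.

Example 11 Consider the dependency:

$$d_0^{C_1,C_2}: \forall t_1, t_2.\ R(t_1) \wedge R(t_2) \wedge t_1 \succ_{C_1} t_2 \Rightarrow t_1 \succ_{C_2} t_2.$$

We have that $cf_2(d_0^{C_1,C_2})$ is equal to

$$[C_1(t_1,t_1) \Rightarrow C_2(t_1,t_1)] \wedge [C_1(t_2,t_2) \Rightarrow C_2(t_2,t_2)] \wedge [C_1(t_1,t_2) \Rightarrow C_2(t_1,t_2)] \wedge [C_1(t_2,t_1) \Rightarrow C_2(t_2,t_1)]$$

which for irreflexive $\succ_{C_1}$ is equivalent to

$$[C_1(t_1,t_2) \Rightarrow C_2(t_1,t_2)] \wedge [C_1(t_2,t_1) \Rightarrow C_2(t_2,t_1)].$$

For a fixed k, the size of $cf_k(f)$ is linear in the size of f. Thus we can easily
characterize the complexity of dependency entailment.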
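
The construction can be sketched in Python as follows. This is an illustration under stated assumptions, not the reduction as implemented anywhere: a dependency is represented only by its constraint part (a predicate on tuples), `cf_k` conjoins that predicate over every way of choosing its arguments among k tuple variables, as in Example 11, and `entails_on_k_tuples` checks the resulting implication by brute force over a small finite domain. The finite domain is adequate here only because the example FDs compare the same attribute of two tuples; the helper names are hypothetical.

```python
from itertools import product

def cf_k(constraint, arity, k):
    """Conjoin `constraint` over all choices of its `arity` arguments
    among k tuple variables (choices with repetition included)."""
    def formula(*tuples):
        assert len(tuples) == k
        return all(constraint(*choice) for choice in product(tuples, repeat=arity))
    return formula

def fd(lhs, rhs):
    """Constraint part of the FD lhs -> rhs as a predicate on two tuples."""
    def constraint(t1, t2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[b] == t2[b] for b in rhs)
        return (not agree_lhs) or agree_rhs
    return constraint

def entails_on_k_tuples(premises, conclusion, k, domain, width):
    """Brute-force check that the conjunction of the premises implies the
    conclusion for every assignment of k tuples of the given width."""
    candidates = list(product(domain, repeat=width))
    for tuples in product(candidates, repeat=k):
        if all(p(*tuples) for p in premises) and not conclusion(*tuples):
            return False
    return True

A, B, C = 0, 1, 2
premises = [cf_k(fd([A], [B]), 2, 2), cf_k(fd([B], [C]), 2, 2)]

print(entails_on_k_tuples(premises, cf_k(fd([A], [C]), 2, 2), 2, range(2), 3))  # True
print(entails_on_k_tuples(premises, cf_k(fd([C], [A]), 2, 2), 2, range(2), 3))  # False
```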

Theorem 6 Assume F is a set of k-CGDs for a fixed $k \geq 1$ over a constraint
theory of equality/rational-order constraints, and preference relations
are defined by ipfs over the same constraint theory. Checking containment,
dependency propagation, and weak or strict partial order property, relative to
F, are all in co-NP.

Proof. Satisfiability of conjunctions of atomic constraints in this constraint
theory can be checked in polynomial time. Thus satisfiability of quantifier-free
formulas in this constraint theory is in NP. ■

What remains now to be shown is that (1) the intractability is, in general,
unavoidable; and (2) special tractable cases exist.

7.2 Lower bounds

0] show a number of co-NP-completeness results for the entailment problem 
restricted to special classes of CGDs. To adapt those results to the context
of the semantic properties of preference queries studied in the present paper, 
we need to show that the hardest (co-NP-hard) cases of the entailment can 
be equivalently expressed in terms of such properties. Such an approach is 
adopted in the proofs of Theorems 7 and 8 to characterize the complexity of 
testing redundancy of winnow and propagation of integrity constraints. On 
the other hand, in Theorem 9, a new reduction is introduced for the problem 
of testing the (relative) weak order property. 

Theorem 7 Checking whether $\omega_C$ is redundant relative to F, where F is a
set of 2-CGDs and C is a rational-order ipf defining a strict partial order, is
co-NP-hard.

Proof. We adapt the proof of Theorem 4.3 in The reduction there is from 
SET SPLITTING but the same reduction applies to MONOTONE 3-SAT. 
Assume we are given a propositional formula with n variables $p_1, \ldots, p_n$,
consisting of l positive clauses of the form $c_h = p_i \vee p_j \vee p_m$, $h = 1, \ldots, l$,
and k negative clauses $c_h = \neg p_i \vee \neg p_j \vee \neg p_m$, $h = 1, \ldots, k$. We consider a
relation R with $n + k + 1$ attributes. The truth of a propositional variable $p_i$
is represented by equality on the attribute i. We build the set F of 2-CGDs
in stages. A positive clause $p_i \vee p_j \vee p_m$ is mapped to a CGD

$$\forall t_1, t_2.\ R(t_1) \wedge R(t_2) \wedge t_1[i] \neq t_2[i] \wedge t_1[j] \neq t_2[j] \Rightarrow t_1[m] = t_2[m].$$

The construction for a negative clause $c_h = \neg p_i \vee \neg p_j \vee \neg p_m$ is more
complicated. We construct the following FDs:

$$\forall t_1, t_2.\ R(t_1) \wedge R(t_2) \wedge t_1[i] = t_2[i] \wedge t_1[n+h] = t_2[n+h] \Rightarrow t_1[n+k+1] = t_2[n+k+1],$$
$$\forall t_1, t_2.\ R(t_1) \wedge R(t_2) \wedge t_1[j] = t_2[j] \wedge t_1[m] = t_2[m] \Rightarrow t_1[n+h] = t_2[n+h].$$

Finally, we define the preference relation $\succ_C$:

$$t_1 \succ_C t_2 \equiv t_1[n+k+1] > t_2[n+k+1].$$

Thus the CGD $d_0^C$ is

$$\forall t_1, t_2.\ R(t_1) \wedge R(t_2) \Rightarrow t_1[n+k+1] \leq t_2[n+k+1].$$

Along the same lines as in the proof of Theorem 4.3 mentioned above, we can show
that the formula is unsatisfiable iff $F \vdash d_0^C$. ■

Theorem 8 Checking whether $F \cup d_0^C \vdash f$, where F is a set of 2-CGDs and
C is a rational-order ipf defining a strict partial order, is co-NP-hard.

Proof. We modify the proof of Theorem 7. We pick one positive clause
$p_i \vee p_j \vee p_m$. It is still mapped to a CGD equivalent to the previous one:

$$\forall t_1, t_2.\ R(t_1) \wedge R(t_2) \Rightarrow t_1[i] = t_2[i] \vee t_1[j] = t_2[j] \vee t_1[m] = t_2[m]$$

but this CGD is now obtained as the special dependency $d_0^C$ for $\succ_C$ defined
as follows

$$t_1 \succ_C t_2 \equiv t_1[i] \neq t_2[i] \wedge t_1[j] \neq t_2[j] \wedge t_1[m] > t_2[m].$$

The CGD f is:

$$f: \forall t_1, t_2.\ R(t_1) \wedge R(t_2) \Rightarrow t_1[n+k+1] = t_2[n+k+1].$$

The construction for the remaining positive clauses, as well as all the negative
clauses, remains the same. ■

Theorem 9 Checking whether $\succ_C$ is a weak order relative to F, where F is
a set of 3-CGDs and C is an equality ipf defining a strict partial order, is
co-NP-hard.

Proof. Reduction from 3-colorability. Assume we are given a graph G = (V, E)
where $V = \{v_1, \ldots, v_n\}$. We construct the set F consisting of the following
CGDs:

$$\forall t.\ R(t) \Rightarrow t[i] = 0 \vee t[i] = 1,$$
$$\forall t.\ R(t) \Rightarrow t[n+1] = 1 \vee t[n+1] = 2 \vee t[n+1] = 3,$$
$$\forall t_1, t_2, t_3.\ R(t_1) \wedge R(t_2) \wedge R(t_3) \wedge t_1[n+1] \neq t_2[n+1] \wedge t_1[n+1] \neq t_3[n+1] \wedge t_2[n+1] \neq t_3[n+1] \Rightarrow \gamma(t_1[i], t_2[i], t_3[i]),$$

where $i = 1, \ldots, n$ and $\gamma(x, y, z)$ is a formula saying that exactly one of x,
y and z is equal to 1. The last dependency is not clausal but can easily be
represented as a set of clausal CGDs. Also, for every edge $(v_i, v_j) \in E$, we
include the following CGD:

$$\forall t_1, t_2, t_3.\ R(t_1) \wedge R(t_2) \wedge R(t_3) \wedge t_1[n+1] \neq t_2[n+1] \wedge t_1[n+1] \neq t_3[n+1] \wedge t_2[n+1] \neq t_3[n+1] \Rightarrow t_1[i] \neq t_1[j] \vee t_2[i] \neq t_2[j] \vee t_3[i] \neq t_3[j].$$

Finally, we define the strict partial order $\succ_C$ as follows:

$$t \succ_C t' \equiv t[n+1] = 1 \wedge t'[n+1] = 2.$$

Assume now that G is 3-colorable. We construct an instance $r = \{t_1, t_2, t_3\}$
as follows. We will use $t_k$, $k = 1, \ldots, 3$, to represent the vertices colored with
the color k. We make $t_k[i] = 1$ if $v_i$ is colored with k; $t_k[i] = 0$ otherwise. We
make $t_1[n+1] = 1$, $t_2[n+1] = 2$ and $t_3[n+1] = 3$. By construction, r satisfies
F but $\succ_C$ on r is not a weak order. Therefore, $\succ_C$ is not a weak order relative
to F.

In the other direction, take an instance $r = \{t_1, t_2, t_3\}$ satisfying F but such
that $\succ_C$ on r is not a weak order. Then $\{t_1[n+1], t_2[n+1], t_3[n+1]\} = \{1, 2, 3\}$.

Then r encodes a 3-coloring for G. ■
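
The first direction of the reduction can be illustrated with a short Python sketch. The graph and colorings are hypothetical, attribute positions follow the proof (attributes 1..n for vertices, attribute n+1 for the color, index 0 unused), and `satisfies_f` is specialized to three-tuple instances whose color attributes are pairwise distinct, which is the only case in which the 3-CGDs of the proof have a satisfied body.

```python
def instance_from_coloring(n, coloring):
    """Build t_1, t_2, t_3 from a map vertex index (1..n) -> color (1..3)."""
    r = []
    for k in (1, 2, 3):
        vertex_bits = [1 if coloring[i] == k else 0 for i in range(1, n + 1)]
        r.append(tuple([None] + vertex_bits + [k]))
    return r

def prefers(t, u, n):
    # t is preferred to u iff t[n+1] = 1 and u[n+1] = 2.
    return t[n + 1] == 1 and u[n + 1] == 2

def satisfies_f(r, n, edges):
    """Check the 'exactly one' and edge dependencies on a three-tuple
    instance with pairwise distinct color attributes."""
    one_color = all(sum(t[i] for t in r) == 1 for i in range(1, n + 1))
    proper = all(any(t[i] != t[j] for t in r) for i, j in edges)
    return one_color and proper

def is_weak_order(r, n):
    """Negative transitivity: x > y implies x > z or z > y for every z."""
    return all(
        prefers(x, z, n) or prefers(z, y, n) or not prefers(x, y, n)
        for x in r for y in r for z in r
    )

# Hypothetical graph: a triangle on vertices 1, 2, 3.
edges = [(1, 2), (2, 3), (1, 3)]
r = instance_from_coloring(3, {1: 1, 2: 2, 3: 3})

print(satisfies_f(r, 3, edges))  # True
print(is_weak_order(r, 3))       # False
```

An improper coloring, such as assigning color 1 to both vertices 1 and 2, yields an instance that violates the edge dependency, so it does not serve as a counterexample.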

7.3 Tractable cases 

We obtain our first tractability results by identifying a new case of PTIME en- 
tailment. The case involves the entailment of a CGD over equality constraints 
by a set of FDs. This case was not studied in j^. Note that it is more gen- 
eral than the standard FD entailment because the CGD may contain general 
equality constraints. 

Theorem 10 Let $F = \{f_1, \ldots, f_n\}$ be a set of FDs and $f_0$ a clausal k-CGD
over equality constraints. Then checking whether $F \models f_0$ is in PTIME.

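Proof. The dependency $f_0$ is of the form

$$\forall t_1. \ldots \forall t_k.\ [R(t_1) \wedge \cdots \wedge R(t_k) \wedge \gamma(t_1, \ldots, t_k)] \Rightarrow \gamma'(t_1, \ldots, t_k).$$

As explained earlier in this section, the entailment $F \models f_0$ reduces to the
validity of the formula

$$\forall t_1, \ldots, t_k.\ \Big(\bigwedge_{f \in F} cf_k(f)\Big) \Rightarrow cf_k(f_0),$$

which is the same as the unsatisfiability of the formula

$$\Big(\bigwedge_{f \in F} cf_k(f)\Big) \wedge \neg cf_k(f_0).$$

We note that for any FD $f = X \rightarrow Y$, $cf_k(f)$ is a conjunction E of implications

$$\bigwedge_{i,j=1,\ldots,k} t_i[X] = t_j[X] \Rightarrow t_i[Y] = t_j[Y].$$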

On the other hand, $\neg cf_k(f_0)$ is a disjunction of conjunctions $S_1, \ldots, S_m$ of
atomic equality constraints. Each $S_i$, $i = 1, \ldots, m$, is of the form
$\phi(t_{i_1}, \ldots, t_{i_k}) \wedge \psi(t_{i_1}, \ldots, t_{i_k})$ where $i_1, \ldots, i_k \in \{1, \ldots, k\}$,
$\phi(t_{i_1}, \ldots, t_{i_k})$ is a conjunction of equalities, and $\psi(t_{i_1}, \ldots, t_{i_k})$ is a
conjunction of inequalities. Both of those conjunctions can be viewed as sets
of atomic constraints.

To determine the satisfiability of the formula $E \wedge (S_1 \vee \cdots \vee S_m)$, we need to
check whether $E \wedge S_i$ is satisfiable for any $i = 1, \ldots, m$. This can be done by
essentially propositional Horn reasoning. We encode each equality and inequality
by a different propositional variable and add Horn clauses representing the
transitivity, symmetry and reflexivity of equality. Using those clauses together
with the implications in $cf_k(f)$ for $f \in F$, we then derive all the equalities
implied by those in $S_i$ and check whether any of them violates reflexivity or
conflicts with an inequality in $S_i$. The satisfiability of $E \wedge S_i$ can thus be
determined in polynomial time. ■
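
The procedure in the proof can be sketched in Python. The sketch below swaps the propositional Horn encoding for an equivalent-in-spirit union-find closure: equalities are merged into classes, the FD implications are applied until a fixpoint is reached, and the inequalities are then checked against the classes. All names and the representation of atoms as `(op, term1, term2)` with terms `(tuple_variable, attribute)` are assumptions of this illustration; constants are not handled.

```python
from itertools import product

class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        self.parent[rx] = ry
        return True

def satisfiable(fds, k, equalities, inequalities):
    """Is E and S satisfiable, where E encodes the FDs over tuple
    variables 0..k-1 and S is the given set of atomic constraints?"""
    uf = UnionFind()
    for a, b in equalities:
        uf.union(a, b)
    changed = True
    while changed:  # derive equalities until nothing new is obtained
        changed = False
        for (lhs, rhs), i, j in product(fds, range(k), range(k)):
            if all(uf.find((i, a)) == uf.find((j, a)) for a in lhs):
                for b in rhs:
                    changed |= uf.union((i, b), (j, b))
    return all(uf.find(a) != uf.find(b) for a, b in inequalities)

def entails(fds, k, body, head):
    """Does the set of FDs entail the clausal k-CGD body => head?
    body is a conjunction and head a disjunction of atoms."""
    negated_head = [("!=" if op == "=" else "=", a, b) for op, a, b in head]
    for theta in product(range(k), repeat=k):  # substitutions of tuple variables
        atoms = [(op, (theta[a[0]], a[1]), (theta[b[0]], b[1]))
                 for op, a, b in body + negated_head]
        eqs = [(a, b) for op, a, b in atoms if op == "="]
        neqs = [(a, b) for op, a, b in atoms if op == "!="]
        if satisfiable(fds, k, eqs, neqs):
            return False  # a counterexample pattern exists
    return True

fds = [(["A"], ["B"]), (["B"], ["C"])]
a_to_c = ([("=", (0, "A"), (1, "A"))], [("=", (0, "C"), (1, "C"))])
c_to_a = ([("=", (0, "C"), (1, "C"))], [("=", (0, "A"), (1, "A"))])

print(entails(fds, 2, *a_to_c))  # True
print(entails(fds, 2, *c_to_a))  # False
```

A CGD with an empty head and a body mixing an equality with an inequality, for example `t0[A] = t1[A]` and `t0[B] != t1[B]`, is handled by the same call; it is entailed by the FD A -> B.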

Corollary 1 Given a set of FDs F and equality ipfs $C_1$ (in DNF) and $C_2$ (in
CNF), the following properties can be checked in PTIME:

(1) the containment of $\succ_{C_1}$ in $\succ_{C_2}$ relative to F, and
(2) $\succ_{C_1}$ being a weak or strict partial order relative to F.

The requirement that the formulas $C_1$ and $C_2$ be in an appropriate normal
form guarantees that the containment dependency $d_0^{C_1,C_2}$ and the dependencies
expressing the order axioms for $\succ_{C_1}$ are representable using
polynomially many clausal CGDs.

We obtain here further tractable cases of the semantic properties studied in 
the present paper by adapting the results of 0]. That paper identifies several 
classes of CGDs for which the entailment problem is tractable. 

The restrictions we impose on CGDs and preference formulas may be of the 
following kinds: 

• the atomic constraints should be typed;

• the number of atomic constraints should be bounded; 

• the width and the span of preference formulas, defined below, should be 
restricted. 

Definition 10 A constraint formula $C(t_1, \ldots, t_n)$ over tuple variables $t_1, \ldots, t_n$
is typed if all its atomic subformulas are of the form $t_i[A]\,\theta\,t_j[A]$ or $t_i[A]\,\theta\,c$,
where A is an attribute, c a constant, and $\theta \in \{=, \neq, <, >, \leq, \geq\}$. A CGD is
typed if all its constraints are typed.

The size of a preference formula C (over a relation R) in DNF is characterized
by two parameters: width(C), the number of disjuncts in C, and span(C),
the maximum number of conjuncts in a disjunct of C. Namely, if
$C = D_1 \vee \cdots \vee D_m$, and each $D_i = C_{i,1} \wedge \cdots \wedge C_{i,k_i}$, then width(C) = m and
$span(C) = \max\{k_1, \ldots, k_m\}$.

Consider first the containment problem. To check whether $\succ_{C_1} \subseteq_F \succ_{C_2}$, we
need to determine whether $F \vdash d_0^{C_1,C_2}$ where

$$d_0^{C_1,C_2}: \forall t_1, t_2.\ R(t_1) \wedge R(t_2) \wedge t_1 \succ_{C_1} t_2 \Rightarrow t_1 \succ_{C_2} t_2.$$

To obtain tractability we need to impose simultaneous restrictions on F, $\succ_{C_1}$,
and $\succ_{C_2}$.
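
The width and span parameters defined above are straightforward to compute once a DNF formula is available in structured form. The sketch below assumes a hypothetical representation of a DNF formula as a list of disjuncts, each a list of atomic constraints written as strings; the example formula is made up.

```python
def width(dnf):
    """Number of disjuncts in the DNF formula."""
    return len(dnf)

def span(dnf):
    """Maximum number of conjuncts in a disjunct (0 for an empty formula)."""
    return max((len(disjunct) for disjunct in dnf), default=0)

# Hypothetical formula: (t1[A] = t2[A] and t1[B] < t2[B]) or (t1[C] = t2[C])
formula = [["t1[A] = t2[A]", "t1[B] < t2[B]"], ["t1[C] = t2[C]"]]

print(width(formula))  # 2
print(span(formula))   # 2
```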

Theorem 11 Let F be a set of typed clausal 2-CGDs with two atomic constraints
over a schema R, and $C_1$ and $C_2$ typed preference formulas over the
same schema and the same constraint theory (either equality or rational
order). Moreover, none of F, $C_1$, and $C_2$ contains constants. Then

• checking whether $\succ_{C_1} \subseteq_F \succ_{C_2}$ can be done in PTIME if $span(C_1) \leq 1$ and
$width(C_2) \leq 1$, and

• checking whether $\succ_{C_1} \subseteq_F \succ_{False}$ can be done in PTIME if $span(C_1) \leq 2$.

Note that, for example, unary FDs are typed 2-CGDs with two atomic equality 
constraints. 

Consider now the problem of propagating integrity constraints. 

Theorem 12 Let F be a set of clausal k-CGDs, f a clausal k-CGD and C a
preference formula over the same schema, and none of F, f, and C contains
constants. Then checking whether $F \cup d_0^C \vdash f$ can be done in PTIME if:

• F, f, and $\neg C$ have at most one atomic constraint each, or

• k = 2, and F, f and $\neg C$ are typed and contain each at most two atomic
constraints over the same constraint theory (either equality or rational
order).

The results adapted above cannot be applied to identify tractable cases of the weak order
or the strict partial order property, because those properties are formulated
using 3-CGDs with three or more atomic constraints. Such CGDs do not fall
into any of the tractable classes identified in the source of those results.



8 Related work 



The basic reference for semantic query optimization is [6]. The most common
techniques are: join elimination/introduction, predicate elimination and
introduction, and detecting an empty answer set. [7] discusses the implementation
of predicate introduction and join elimination in an industrial query
optimizer. Semantic query optimization techniques for relational queries are
studied in l3J| in the context of denial and referential constraints, and in 13C
in the context of constraint tuple-generating dependencies (a generalization
of CGDs and classical relational dependencies). FDs are used for reasoning
about sort orders in ISi



Two different approaches to preference queries have been pursued in the literature:
qualitative and quantitative. In the qualitative approach, represented by
[29], [20], [27], 15|, lly, [8], [9], [19], [21], [23], the preferences between tuples in the answer
to a query are specified directly, typically using binary preference relations. In
the quantitative approach, as represented by [18], preferences are specified
indirectly using scoring functions that associate a numeric score with every
tuple of the query answer. Then a tuple $t_1$ is preferred to a tuple $t_2$ iff the score
of $t_1$ is higher than the score of $t_2$. The qualitative approach is strictly more
general than the quantitative one, since one can define preference relations in
terms of scoring functions. However, not every intuitively plausible preference
relation can be captured by scoring functions.

Example 12 There is no scoring function that captures the preference rela- 
tion described in Example 1. Since there is no preference defined between any 
of the first three tuples and the fourth one, the score of the fourth tuple should 
be equal to all of the scores of the first three tuples. But this implies that the 
scores of the first three tuples are the same, which is not possible since the 
second tuple is preferred to the first one which in turn is preferred to the third 
one. 

Example 13 Another common example of a preference relation that is not
representable using a utility function is the threshold of detectable difference
relation $\succ_t$:

$$x \succ_t y \equiv x > y + c$$

where c is the threshold value (c > 0).
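
A short Python sketch makes the obstacle concrete. With a scoring function, two values that are not ordered by the preference would receive equal scores, so indifference would have to be transitive; for the threshold relation it is not. The threshold c = 1 and the sample values 0, 1, 2 are arbitrary choices for the illustration.

```python
def threshold_pref(c):
    """Return the relation x > y + c for a fixed threshold c > 0."""
    def prefers(x, y):
        return x > y + c
    return prefers

def indifferent(prefers, x, y):
    """Neither value is preferred to the other."""
    return not prefers(x, y) and not prefers(y, x)

prefers = threshold_pref(1)

print(indifferent(prefers, 2, 1))  # True
print(indifferent(prefers, 1, 0))  # True
print(prefers(2, 0))               # True: indifference is not transitive
```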

This lack of expressiveness of the quantitative approach is well known in utility
theory IJ, [13]. The importance of weak orders in this context comes from
the fact that only weak orders can be represented using real-valued scoring
functions (and for countable domains this is also a sufficient condition for the
existence of such a representation [13]). However, even if a utility function is
known to exist, its definition may be non-explicit H and thus unusable in
the context of database queries. In the present paper we do not assume that
preference relations are weak orders. We only characterize a condition under 
which preference relations become weak orders relative to a set of integrity 
constraints. In such cases, we can exploit the benefits of a preference relation 
being a (relative) weak order, for example the possibility of using WWO for 
computing winnow, without a need to construct a specific utility function 
representing the preference relation. 

Algebraic optimization of preference queries is discussed in the papers 0, |2ll



9 Conclusions and further work



We have presented several novel techniques for semantic optimization of preference
queries, focusing on the winnow operator. We characterized the necessary
and sufficient conditions for the application of those techniques in terms
of the entailment of constraint-generating dependencies. (This idea was suggested
but not fully developed in earlier work.) As a consequence, we were able to
leverage some of the existing computational complexity results for CGD entailment.
Moreover, we proved here several new complexity results: Theorems 9 and 10. Theorem 3
is also completely new. Other results are reformulations of those presented in
0.

The simplicity of our results attests to the power of logical formulation of
preference relations. However, our results are applicable not only to the original
logical framework of IM [9], but also to preference queries defined using
preference constructors [19, 23] and skyline queries [1, [l^lH, EH because such
queries can be expressed using preference formulas.

The ideas presented in this paper in the context of winnow can be adapted 
to other preference-related operators. For example, ranking [9] associates with 
each tuple in a relation its rank. The best tuples (computed by winnow) have 
rank 1, the second-best tuples have rank 2, etc. The algorithm WWO can be 
extended to compute ranking instead of winnow, and thus the computation 
of ranking will also benefit if the given preference relation is a weak order 
relative to the given integrity constraints. 
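
A minimal Python sketch of ranking follows. It assumes that ranks are obtained by iterating winnow (rank 1 is the winnow of the relation, rank 2 is the winnow of what remains, and so on); this reading of the description above is an assumption, and the sketch is not the WWO-based algorithm. The sample relation and preference are hypothetical.

```python
def winnow(prefers, r):
    """Tuples of r to which no tuple of r is preferred."""
    return {t for t in r if not any(prefers(s, t) for s in r)}

def ranking(prefers, r):
    """Map each tuple of r to its rank, computed by iterated winnow."""
    ranks, remaining, level = {}, set(r), 1
    while remaining:
        best = winnow(prefers, remaining)
        if not best:  # can only happen if the preference has a cycle
            raise ValueError("preference relation is not a strict partial order")
        for t in best:
            ranks[t] = level
        remaining -= best
        level += 1
    return ranks

def prefers(t1, t2):
    # Assumed preference: same ISBN, lower price.
    return t1[0] == t2[0] and t1[1] < t2[1]

# Hypothetical (isbn, price) tuples.
offers = {("A", 15), ("A", 13), ("A", 14), ("B", 7)}

print(ranking(prefers, offers))
```

On this instance the two undominated offers get rank 1, the offer at price 14 gets rank 2, and the offer at price 15 gets rank 3.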

Further work can address, for example, the following issues: 

• identifying other semantic optimization techniques for preference queries, 

• expanding the class of integrity constraints by considering, e.g., tuple-generating 
dependencies and referential integrity constraints, 

• deriving further tractable cases of (relative) containment and satisfaction of 
order axioms, 

• studying the preservation of general constraint-generating dependencies by
relational algebra operators and expressions ([25] consider this problem for
functional and join dependencies);

• identifying weaker but easier to check sufficient conditions for the application
of our techniques.